Data Set Pagesize Clustering Merging
Authors
Abstract
… if the structure to be mapped is a DAG. We presented heuristics to handle the above cases. Finally, we presented a performance study that bears out our analytical results and shows that the optimal clustering and Smart-BFS techniques perform much better than pre-order clustering on path length measures, and only a little worse on pre-order traversal.
Our techniques are likely to be of importance for creating special-purpose access structures on disk, and can reduce effort significantly compared to designing new disk access structures. They are also likely to be of importance in object-oriented databases, which make creation of linked structures on disk very easy; in particular, our techniques will make it easy to get good performance from the access structures. The programmer will not have to worry either about where to map nodes or about how to convert long skinny structures to short fat structures.
Acknowledgements
We would like to thank P. Venkatachalam of CSRE, IIT Bombay, for providing the data sets on which we ran our tests, and P.P.S. Narayan for implementing the in-memory quad-tree code.
… shown that with pre-order merging, the cost of pre-order traversal is exactly equal to the number of pages. To summarize, the optimal clustering with either of the merge techniques and Smart-BFS both perform well across all the metrics we considered. Both can be computed by a simple linear-time algorithm. The optimal clustering has the important benefit that it will never generate a clustering with a worst-case height greater than that of Smart-BFS. It is not hard to generate data sets where Smart-BFS generates a bad clustering. One such data set can be constructed by taking a collection of N balanced binary trees, each of which has as many nodes as will fit in a page, and linking them up so that the root of each tree but the first is the rightmost descendant of the previous tree. If the roots of all trees fit in a page, optimal clustering gives a height of 2, while Smart-BFS gives a height of N.
7 Discussion
An alternative to the definition of external path length of a path P under a mapping M in Section 2 would be to use the number of distinct pages in the mapping …
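The chained-tree data set described in the abstract can be illustrated with a minimal Python sketch. It does not implement Smart-BFS or the paper's optimal clustering algorithm. It only builds the structure and compares two hand-written page mappings: one page per tree, which stands in for the height-N outcome the abstract attributes to Smart-BFS, and one that places the whole rightmost spine in a single page. Several points are assumptions of this sketch rather than statements from the source: each new root is attached as a child of the previous tree's rightmost leaf, "height" is counted as the largest number of distinct pages on any root-to-leaf path, and the entire spine (not only the roots) is assumed to fit in one page. All function names are hypothetical.

```python
def build_chain(n_trees, levels):
    """Chain n_trees complete binary trees, each with 2**levels - 1 nodes.

    Assumption: the root of tree t+1 becomes a child of the rightmost
    leaf of tree t. Node 0 is the root of the whole structure.
    """
    size = 2 ** levels - 1
    children, tree_of, spine = {}, {}, set()
    prev_rightmost = None
    for t in range(n_trees):
        base = t * size
        # Heap layout: local index i has children 2i+1 and 2i+2.
        for i in range(size):
            tree_of[base + i] = t
            children[base + i] = [base + c for c in (2 * i + 1, 2 * i + 2) if c < size]
        # Walk the rightmost path of this tree: 0, 2, 6, 14, ...
        i = 0
        while i < size:
            spine.add(base + i)
            rightmost = base + i
            i = 2 * i + 2
        if prev_rightmost is not None:
            children[prev_rightmost].append(base)
        prev_rightmost = rightmost
    return children, tree_of, spine


def spine_mapping(children, spine, root=0):
    """Map the spine to one page and each off-spine subtree to its own page."""
    page_of = {root: "spine"}
    stack = [root]
    while stack:
        node = stack.pop()
        for kid in children[node]:
            if kid in spine:
                page_of[kid] = "spine"
            elif node in spine:
                page_of[kid] = ("subtree", kid)  # new page rooted at this child
            else:
                page_of[kid] = page_of[node]     # stay in the parent's page
            stack.append(kid)
    return page_of


def page_height(children, page_of, root=0):
    """Largest number of distinct pages on any root-to-leaf path."""
    best = 0
    stack = [(root, frozenset([page_of[root]]))]
    while stack:
        node, pages = stack.pop()
        if not children[node]:
            best = max(best, len(pages))
        for kid in children[node]:
            stack.append((kid, pages | {page_of[kid]}))
    return best


N_TREES, LEVELS = 3, 4
PAGE_CAPACITY = 2 ** LEVELS - 1           # one full tree fits in a page
children, tree_of, spine = build_chain(N_TREES, LEVELS)
assert len(spine) <= PAGE_CAPACITY        # sketch assumption: spine fits in one page

per_tree_height = page_height(children, tree_of)
spine_height = page_height(children, spine_mapping(children, spine))
print(per_tree_height, spine_height)
```

With three 15-node trees, the one-page-per-tree mapping touches 3 pages on the longest path, while the spine mapping touches 2, which matches the N versus 2 gap the abstract describes.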
Similar resources
Merging Similarity and Trust Based Social Networks to Enhance the Accuracy of Trust-Aware Recommender Systems
In recent years, collaborative filtering (CF) methods have become important and widely accepted techniques for recommender systems. One of these techniques is user-based: it produces useful recommendations based on similarity to the ratings of like-minded users. However, these systems suffer from several inherent shortcomings such as data sparsity and cold start problems. With the dev...
Implementation of Multiple Pagesize Support in HP-UX
To reduce performance degradation from Translation Lookaside Buffer (TLB) misses without significant increase in TLB size, most modern processors implement TLBs that support multiple pagesizes. For example, Hewlett-Packard's PA-8000 processor allows 8 hardware pagesizes, in multiples of four, ranging from 4 Kbytes to 64 Mbytes. In implementing multiple pagesize support in HP-UX, we chose to creat...
Automatic Ontology Merging by Hierarchical Clustering and Inference Mechanisms
One of the core challenges in the current landscape of ontology-based research is to develop efficient ontology merging algorithms that can resolve mismatches with no or minimal human intervention, and automatically generate a global merged ontology on the fly to fulfil the needs of automated enterprise business applications and mediation-based data warehousing. This paper presents our approach of ...
A scalable framework for cluster ensembles
An ensemble of clustering solutions or partitions may be generated for a number of reasons. If the data set is very large, clustering may be done on tractable size disjoint subsets. The data may be distributed at different sites for which a distributed clustering solution with a final merging of partitions is a natural fit. In this paper, two new approaches to combining partitions, represented ...
Clustering by intersection-merging
We propose Intersection-Merging (IM), a wrapper algorithm for model-based clustering. The algorithm takes a set of clusterings obtained e.g. by EM, breaks down the clusterings into subclusters via an intersection step, and then agglomerates them via a merging step. We introduce two versions of merging: greedy (standard IM) and by simulated annealing (IMSA). Experiments on several data sets show th...